Assessing the Reconstruction of Macro-molecular Assemblies: the Example of the Nuclear Pore Complex
Authors: Cazals and Dreyfus
Abstract
The reconstruction of large protein assemblies is a major challenge due to their plasticity and to the flexibility of the proteins involved. An emerging trend to cope with these uncertainties consists of performing the reconstruction by integrating experimental data from several sources, a strategy recently used to propose qualitative reconstructions of the Nuclear Pore Complex (NPC). Yet, the absence of clearly identified canonical reconstructions and the lack of quantitative assessment with respect to the experimental data are detrimental to the mechanistic exploitation of the results. To leverage such reconstructions, this work proposes a modeling framework inherently accommodating uncertainties, and allowing a precise assessment of the reconstructed models. We make three contributions. First, we introduce toleranced models to accommodate the positional and conformational uncertainties of protein instances within large assemblies. A toleranced model is a continuum of geometries whose distinct topologies can be enumerated, and mining stable complexes amidst this finite set hints at important structures in the assembly. Second, we present a panoply of tools to perform a multi-scale topological, geometric, and biochemical assessment of the complexes associated with a toleranced model, at the assembly and local levels. At the assembly level, we assess the prominence of contacts and the quality of the reconstruction, in particular with respect to symmetries. At the local level, the complexes encountered in the toleranced model are used to confirm, question, or suggest protein contacts within a 3D template known at atomic resolution. Third, we apply our machinery to the NPC, for which we (i) report prominent contacts uncovering sub-complexes of the NPC, (ii) explain the closure of the two rings involving 16 copies of the Y-complex, and (iii) develop a new 3D template for the T-complex.
These contributions should prove instrumental in enhancing the reconstruction of assemblies, and in selecting the models which best comply with experimental data.

Key-words: Proteins, macro-molecular complexes, structural biology, nuclear pore complex. Union of balls, curved Voronoi diagrams, curved α-shapes, stability, topological persistence, graph matching, maximum common sub-graphs.

Affiliation (both authors): INRIA Sophia-Antipolis-Méditerranée, Algorithms-Biology-Structure.

inria-00559117, version 2 - 2 Mar 2011

Assessing the Reconstruction of Large Protein Assemblies: the Example of the Nuclear Pore

Résumé (translated from French): The reconstruction of large assemblies is a major challenge owing to their plasticity, and also to the flexibility of the proteins involved. An emerging strategy to cope with these uncertainties consists of integrating diverse experimental data, a strategy that has proved its worth for reconstructing qualitative models of the nuclear pore complex (NPC), the largest protein complex known to date in the eukaryotic cell. Nevertheless, the absence of canonical reconstructions on the one hand, and of a quantitative assessment of the consistency between the models produced and the experimental data used on the other, hampers the exploitation of the results. To improve these reconstructions, this work proposes a modeling paradigm that inherently takes uncertainties into account and also allows a precise assessment of the reconstructed models. The contributions presented are threefold. First, we introduce toleranced models in order to account for the uncertainties on the position and shape of proteins within an assembly. A toleranced model is a geometric continuum whose possible topologies can all be enumerated, and the stable regions among them are so many hints at potentially important parts of the assembly. Next, we present a panoply of tools for the topological, geometric, and biochemical analysis of the complexes associated with a toleranced model, at both the global and local levels. At the assembly level, we assess the prominence of contacts and the quality of the reconstruction, in particular with respect to symmetries. At the local level, the observed complexes are used to confirm, refute, or suggest new contacts within a 3D template of a sub-system. Finally, we apply these tools to the nuclear pore, for which we (i) highlight prominent contacts relating to several sub-systems, (ii) study the closure of the two rings involving 16 copies of the Y-complex, and (iii) develop a new 3D template for the T-complex. More generally, we believe that this work will make it possible both to improve the reconstruction of large assemblies and to select the reconstructions showing the strongest consistency with the experimental data.

Key-words (translated from French): Proteins, macro-molecular complexes, interfaces, nuclear pore, structural biology. Union of balls, curved Voronoi diagrams, curved α-shapes, stability, topological persistence, graph matching, maximal sub-graphs.

1 Reconstructing Large Macro-molecular Assemblies

Reconstruction by data integration. Large protein assemblies such as the Nuclear Pore Complex (NPC), chaperonin cavities, the proteasome or ATP synthases, to name a few, are key to numerous biological functions.
To improve our understanding of these functions, one would ideally like to build and animate atomic models of these molecular machines. However, this task is especially tough, due to their size and their plasticity, but also due to the flexibility of the proteins involved. In a sense, the modeling challenges arising in this context differ from those faced in binary docking, and also from those encountered for intermediate-size complexes, which are often amenable to a treatment combining (cryo-EM) image analysis with classical docking. To face these new challenges, an emerging paradigm is that of reconstruction by data integration [AFK08]. In a nutshell, the strategy is reminiscent of NMR and consists of mixing experimental data from a variety of sources, so as to identify the model(s) that best comply with the data. This strategy has in particular been used to propose plausible models of the Nuclear Pore Complex [ADV07a], the largest assembly known to date in the eukaryotic cell, which consists of 456 protein instances of 30 types.

Modeling with uncertainties and model assessment. Reconstruction by data integration requires three ingredients. First, a parametrized model must be adopted, typically a collection of balls to model a protein with pseudo-atoms. Second, as in NMR, a functional measuring the agreement between a model and the data must be chosen. In [ADV07b], this functional is based upon restraints, namely penalties associated with the experimental data. Third, an optimization scheme must be selected. The design of restraints is notoriously challenging, due to the ambiguous nature and/or the noise level of the data. For example, Tandem Affinity Purification (TAP) gives access to a pullout, i.e., a list of protein types which are known to interact with one tagged protein type, but provides no information on the number of complexes or on the stoichiometry of protein types within a complex.
In cryo-EM, the envelope enclosing an assembly is often imprecisely defined, in particular in regions of low density. For immuno-EM labelling experiments, positional uncertainties arise from the microscope resolution. These uncertainties, coupled with the complexity of the functional being optimized, which is in general non-convex, have two consequences. First, it is impossible to single out a unique reconstruction, and a set of plausible reconstructions must be considered. As an example, 1000 plausible models of the NPC were reported in [ADV07b]. Interestingly, averaging the positions of all balls of a particular protein type across these models resulted in 30 so-called probability density maps, each such map encoding the probability of presence of a particular protein type at a particular location in the NPC. Second, the assessment of all models (individual and averaged) is non-trivial. In particular, the lack of straightforward statistical analysis of the individual models and the absence of assessment for the averaged models are detrimental to the mechanistic exploitation of the reconstruction results. At this stage, such models therefore remain qualitative.

Contributions. This work tackles the difficulties just discussed, and proposes a modeling framework inherently accommodating uncertainties, and allowing a precise assessment of reconstructed models. We make three contributions.

First, we introduce toleranced models to accommodate the positional and conformational uncertainties of protein instances within large assemblies. A toleranced model is a collection of toleranced balls, each such ball consisting of two concentric balls called the inner and outer balls, respectively meant to encode high and low confidence regions. A toleranced model is a one-parameter family of shapes, since growing the radii of toleranced balls results in a continuum of nested geometries, thereby accommodating the aforementioned uncertainties.
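To make the growth of a toleranced model concrete, here is a minimal sketch; it is not the authors' implementation. It assumes a linear growth law, r(λ) = r_inner + λ (r_outer − r_inner), which the text above does not state, and the names `TolerancedBall`, `contact_lambda`, and `merge_events` are hypothetical. The sketch tracks only the connected components of the union of grown balls, merging groups in a union-find structure as pairwise contacts appear; it computes neither α-shapes nor any higher-order topological feature.

```python
import math
from dataclasses import dataclass
from itertools import combinations
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class TolerancedBall:
    """Two concentric balls: inner (high confidence) and outer (low confidence)."""
    name: str  # assumed unique within a model
    center: Tuple[float, float, float]
    r_inner: float
    r_outer: float

    def radius(self, lam: float) -> float:
        # Assumed growth law: linear interpolation between the two radii.
        return self.r_inner + lam * (self.r_outer - self.r_inner)


def contact_lambda(a: TolerancedBall, b: TolerancedBall) -> float:
    """Smallest lam at which the grown balls touch (negative if they
    already overlap at lam = 0, infinite if they never touch)."""
    gap = math.dist(a.center, b.center) - (a.r_inner + b.r_inner)
    rate = (a.r_outer - a.r_inner) + (b.r_outer - b.r_inner)
    if rate == 0.0:
        return -math.inf if gap <= 0.0 else math.inf
    return gap / rate


def merge_events(balls: List[TolerancedBall]) -> List[Tuple[float, int]]:
    """List of (lam, number of connected components) after each merge."""
    parent: Dict[str, str] = {b.name: b.name for b in balls}

    def find(x: str) -> str:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    contacts = sorted(
        (contact_lambda(a, b), a.name, b.name)
        for a, b in combinations(balls, 2)
    )
    events: List[Tuple[float, int]] = []
    n_components = len(balls)
    for lam, u, v in contacts:
        if lam == math.inf:
            break  # remaining pairs never come into contact
        root_u, root_v = find(u), find(v)
        if root_u != root_v:  # this contact joins two distinct groups
            parent[root_u] = root_v
            n_components -= 1
            events.append((lam, n_components))
    return events


# Hypothetical toy data: three balls on a line.
toy = [
    TolerancedBall("A", (0.0, 0.0, 0.0), 1.0, 2.0),
    TolerancedBall("B", (3.0, 0.0, 0.0), 1.0, 2.0),
    TolerancedBall("C", (8.0, 0.0, 0.0), 1.0, 3.0),
]
print(merge_events(toy))
```

On this toy input, A and B touch at λ = 0.5 and C joins them at λ = 1.0, so two merge events are reported. Only the contacts that join distinct groups change the component count, which is why the list of events is finite and usually much shorter than the list of pairs.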
In particular, it is possible to enumerate the finite set of topologies encountered along the growth process, each connected component associated with a given topology being a protein complex.

Second, we present a panoply of tools to perform a topological, geometric, and biochemical assessment of the complexes associated with a toleranced model, at the global and local levels. At the global level, a multiscale investigation of the protein complexes involving selected protein types provides information on prominent contacts and on the overall quality of the reconstruction, which is especially useful in the presence of symmetries. At the local level, let a template be a 3D model meant to probe the protein complexes of the toleranced model. We confirm, question, or suggest protein contacts of the template based on the prominent contacts seen in the toleranced model, and hint at missing and/or ill-placed proteins. Note that these tools can naturally be used to run in-silico experiments aimed at testing hypotheses.

RR n° 7513, Cazals and Dreyfus

Third, we apply our machinery to the NPC, so as to bridge the gap between global yet qualitative models of the whole NPC, and atomic models of sub-complexes. Starting from a toleranced model derived from the probability density maps of Alber et al. [ADV07a], we (i) report prominent contacts uncovering sub-complexes of the NPC, (ii) explain the closure of the two rings involving copies of the Y-complex, in the context of the work by Blobel et al. [KB09] and Vetter et al. [SSF08], and (iii) develop a new template for the T-complex.

Mathematically, our framework elaborates on previous work in computational geometry, computational topology, and graph theory.
Tracking the evolution of topological features associated with a collection of growing balls is the seminal contribution of affine α-shapes [Ede92], which were later put in the context of Morse theory [GJ03] and topological persistence [CSEH05, CCS11]. In this context, the novelty of our work resides in the introduction of toleranced models, and in the ability to investigate the one-parameter family of shapes defined by the α-shape of an additively-multiplicatively weighted Voronoi diagram [CD10]. Also, the comparison of the contacts within a protein complex and a template is phrased in terms of Maximal Common Induced Subgraph (MCIS) and Maximal Common Edge Subgraph (MCES), which are enumeration problems admitting exact algorithms [CK05].
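The comparison of a complex with a template can be illustrated on tiny graphs. The sketch below is a brute-force search, not the exact algorithms of [CK05], and it returns only the size of a largest common edge subgraph rather than enumerating all maximal ones. It assumes that vertices are protein instances labeled by protein type, that edges are contacts, and that a vertex may only be matched to a vertex of the same type; the function name `mces_size` and the type labels "A", "B", "C" are hypothetical.

```python
from typing import Dict, FrozenSet, Iterable, Set, Tuple


def mces_size(
    labels1: Dict[str, str],
    edges1: Iterable[Tuple[str, str]],
    labels2: Dict[str, str],
    edges2: Iterable[Tuple[str, str]],
) -> int:
    """Largest number of edges of graph 1 that can be sent onto edges of
    graph 2 by an injective, label-preserving partial vertex mapping.
    Exponential time: only meant for very small graphs without self-loops."""
    e1: Set[FrozenSet[str]] = {frozenset(e) for e in edges1}
    e2: Set[FrozenSet[str]] = {frozenset(e) for e in edges2}
    nodes1 = list(labels1)
    mapping: Dict[str, str] = {}
    used: Set[str] = set()
    best = 0

    def count_common() -> int:
        # An edge is common if both endpoints are mapped and the image is an edge.
        return sum(
            1
            for e in e1
            if all(v in mapping for v in e)
            and frozenset(mapping[v] for v in e) in e2
        )

    def extend(i: int) -> None:
        nonlocal best
        if i == len(nodes1):
            best = max(best, count_common())
            return
        u = nodes1[i]
        extend(i + 1)  # option 1: leave u unmatched
        for w, label in labels2.items():  # option 2: match u to a free vertex
            if w not in used and label == labels1[u]:
                mapping[u] = w
                used.add(w)
                extend(i + 1)
                del mapping[u]
                used.remove(w)

    extend(0)
    return best


# Hypothetical toy data: a triangle of contacts probed against a path template.
complex_labels = {"u1": "A", "u2": "B", "u3": "C"}
complex_edges = [("u1", "u2"), ("u2", "u3"), ("u1", "u3")]
template_labels = {"v1": "A", "v2": "B", "v3": "C", "v4": "B"}
template_edges = [("v1", "v2"), ("v2", "v3"), ("v3", "v4")]
print(mces_size(complex_labels, complex_edges, template_labels, template_edges))
```

On this toy input the result is 2: the contacts A-B and B-C are shared, whereas the A-C contact of the complex has no counterpart in the template. In the vocabulary of the text, the two shared contacts would be confirmed and the third would be a contact the complex suggests for the template.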
Publication date: 2011